207 research outputs found

    Faster Lead Optimization Mapper Algorithm for Large-Scale Relative Free Energy Perturbation

    In recent years, free energy perturbation (FEP) calculations have garnered increasing attention as tools to support drug discovery. The lead optimization mapper (Lomap) was proposed as an algorithm to calculate the relative free energy between ligands efficiently. However, Lomap requires checking whether each edge in the FEP graph is removable, which necessitates checking the constraints for all edges. Consequently, conventional Lomap requires significant computation time, at least several hours for cases involving hundreds of compounds, and is impractical for cases with more than tens of thousands of edges. In this study, we aimed to reduce the computational cost of Lomap to enable the construction of FEP graphs for hundreds of compounds. We can reduce the overall number of constraint checks required from an amount dependent on the number of edges to one dependent on the number of nodes by using the chunk check process to check the constraints for as many edges as possible simultaneously. Moreover, the output graph is equivalent to that obtained using conventional Lomap, enabling direct replacement of the original Lomap with our method. With our improvement, the execution was tens to hundreds of times faster than that of the original Lomap. https://github.com/ohuelab/FastLoma
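The idea of checking many edges at once can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the only constraint is graph connectivity, that the edge list is already ordered from least to most desirable, and that a failed chunk is handled by bisection. The function names (`is_connected`, `prune_sequential`, `prune_chunked`) are hypothetical. Because connectivity can only be lost, never gained, by removing edges, a chunk whose joint removal passes the check would also have passed edge by edge, so both routines return the same graph.

```python
from collections import defaultdict


def is_connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:  # depth-first traversal from the first node
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def prune_sequential(nodes, edges, constraint=is_connected):
    """Baseline: test one edge per constraint check, in the given order."""
    kept = list(edges)
    checks = 0
    for e in edges:
        trial = [x for x in kept if x != e]
        checks += 1
        if constraint(nodes, trial):
            kept = trial  # edge e is removable
    return kept, checks


def prune_chunked(nodes, edges, constraint=is_connected):
    """Chunked variant (assumed bisection scheme): try to remove a whole
    chunk with one check, and split the chunk only when that check fails."""
    kept = set(edges)
    checks = 0

    def process(chunk):
        nonlocal kept, checks
        if not chunk:
            return
        trial = kept - set(chunk)
        checks += 1
        if constraint(nodes, trial):
            kept = trial  # every edge in the chunk is removable
            return
        if len(chunk) == 1:
            return  # this single edge must stay
        mid = len(chunk) // 2
        process(chunk[:mid])  # earlier edges are decided first
        process(chunk[mid:])

    process(list(edges))
    # Report the surviving edges in their original order.
    return [e for e in edges if e in kept], checks
```

Both functions also return the number of constraint checks performed, so the two strategies can be compared on a given input. The chunked variant tends to need fewer checks when long runs of edges are removable together, but this sketch does not guarantee a saving for every input.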

    Enhancing Model Learning and Interpretation Using Multiple Molecular Graph Representations for Compound Property and Activity Prediction

    Graph neural networks (GNNs) demonstrate great performance in compound property and activity prediction due to their capability to efficiently learn complex molecular graph structures. However, two main limitations persist: compound representation and model interpretability. While atom-level molecular graph representations are commonly used because of their ability to capture natural topology, they may not fully express important substructures or functional groups which significantly influence molecular properties. Consequently, recent research proposes alternative representations employing reduction techniques to integrate higher-level information and leverages both representations for model learning. However, there is still a lack of studies on how different molecular graph representations affect model learning and interpretation. Interpretability is also crucial for drug discovery as it can offer chemical insights and inspiration for optimization. Numerous studies attempt to include model interpretation to explain the rationale behind predictions, but most of them focus solely on individual predictions, with little analysis of the interpretation on different molecular graph representations. This research introduces multiple molecular graph representations that incorporate higher-level information and investigates their effects on model learning and interpretation from diverse perspectives. The results indicate that combining atom graph representation with reduced molecular graph representation can yield promising model performance. Furthermore, the interpretation results can provide significant features and potential substructures consistently aligning with background knowledge. These multiple molecular graph representations and interpretation analysis can bolster model comprehension and facilitate relevant applications in drug discovery.

    MEGADOCK 3.0: a high-performance protein-protein interaction prediction software using hybrid parallel computing for petascale supercomputing environments

    BACKGROUND: Protein-protein interaction (PPI) plays a core role in cellular functions. Massively parallel supercomputing systems have been actively developed over the past few years, which enable large-scale biological problems to be solved, such as PPI network prediction based on tertiary structures. RESULTS: We have developed a high-throughput and ultra-fast PPI prediction system based on rigid docking, “MEGADOCK”, by employing a hybrid parallelization (MPI/OpenMP) technique intended for use on massively parallel supercomputing systems. MEGADOCK displays significantly faster processing speed in the rigid-body docking process, which enables full utilization of protein tertiary structural data for large-scale and network-level problems in systems biology. Moreover, the system was scalable, as shown by measurements carried out on two supercomputing environments. We then conducted prediction of biological PPI networks using the post-docking analysis. CONCLUSIONS: We present a new protein-protein docking engine aimed at exhaustive docking of mega-order numbers of protein pairs. The system was shown to be scalable by running on thousands of nodes. The software package is available at: http://www.bi.cs.titech.ac.jp/megadock/k/

    A prospective compound screening contest identified broader inhibitors for Sirtuin 1

    Potential inhibitors of a target biomolecule, NAD-dependent deacetylase Sirtuin 1, were identified by a contest-based approach, in which participants were asked to propose a prioritized list of 400 compounds from a designated compound library containing 2.5 million compounds using in silico methods and scoring. Our aim was to identify target enzyme inhibitors and to benchmark computer-aided drug discovery methods under the same experimental conditions. Collecting compound lists derived from various methods is advantageous for aggregating compounds with structurally diversified properties compared with the use of a single method. The inhibitory action on Sirtuin 1 of approximately half of the proposed compounds was experimentally assessed. Ultimately, seven structurally diverse compounds were identified.

    MEGADOCK-on-Colab: an easy-to-use protein–protein docking tool on Google Colaboratory

    Motivation: Since the advent of ColabFold, numerous software packages have been provided with Google Colaboratory-compatible ipynb files, allowing users to effortlessly test and reproduce results without the need for local installation or configuration. MEGADOCK, a protein–protein docking tool, is particularly well-suited for Google Colaboratory due to its lightweight computations and GPU acceleration capabilities. To increase accessibility and promote widespread use, it is crucial to provide a computing environment compatible with Google Colaboratory. Results: In this study, we report the development of a Google Colaboratory environment for running our protein–protein docking software, MEGADOCK. We provide a comprehensive ipynb file, including the compilation of MEGADOCK with the FFTW library installation on Colaboratory, the introduction of related tools using PyPI/apt, and the execution and visualization of docking structures. This streamlined environment enables users to visualize docking structures with just one click. The code is available under a CC-BY NC 4.0 license from https://github.com/ohuelab/MEGADOCK-on-Colab

    Drug-target affinity prediction using applicability domain based on data density

    In the pursuit of research and development of drug discovery, the computational prediction of the target affinity of a drug candidate is useful for screening compounds at an early stage and for verifying the binding potential to an unknown target. The chemogenomics-based method has attracted increased attention as it integrates information pertaining to the drug and target to predict drug-target affinity (DTA). However, the compound and target spaces are vast, and without sufficient training data, proper DTA prediction is not possible. If a DTA prediction is made in this situation, it will potentially lead to false predictions. In this study, we propose a DTA prediction method that can advise whether/when there are insufficient samples in the compound/target spaces based on the concept of the applicability domain (AD) and the data density of the training dataset. AD indicates a data region in which a machine learning model can make reliable predictions. By preclassifying the samples to be predicted by the constructed AD into those within (In-AD) and those outside the AD (Out-AD), we can determine whether a reasonable prediction can be made for these samples. The results of the evaluation experiments based on the use of three different public datasets showed that the AD constructed by the k-nearest neighbor (k-NN) method worked well, i.e., the prediction accuracy of the samples classified by the AD as Out-AD was low, while the prediction accuracy of the samples classified by the AD as In-AD was high
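A density-based applicability domain of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method from the paper: it assumes samples are plain numeric feature vectors, uses the mean Euclidean distance to the k nearest training samples as the density measure, and sets the In-AD/Out-AD cutoff at a percentile of the training set's own k-NN distances. The function names and the default `k` and `percentile` values are hypothetical.

```python
import numpy as np


def knn_mean_distance(train, queries, k, exclude_self=False):
    """Mean Euclidean distance from each query to its k nearest training points."""
    train = np.asarray(train, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Pairwise distances, shape (n_queries, n_train).
    diff = queries[:, None, :] - train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    dist.sort(axis=1)
    # When the queries are the training points themselves, skip the zero self-distance.
    start = 1 if exclude_self else 0
    return dist[:, start:start + k].mean(axis=1)


def fit_ad_threshold(train, k=3, percentile=95.0):
    """Assumed rule: the cutoff is a percentile of the training k-NN distances."""
    d = knn_mean_distance(train, train, k, exclude_self=True)
    return float(np.percentile(d, percentile))


def classify_ad(train, queries, threshold, k=3):
    """Label each query 'In-AD' (dense training region) or 'Out-AD' (sparse region)."""
    d = knn_mean_distance(train, queries, k)
    return np.where(d <= threshold, "In-AD", "Out-AD")


if __name__ == "__main__":
    # Toy training set: a 5 x 5 grid of points in a 2-D feature space.
    train = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
    threshold = fit_ad_threshold(train, k=3)
    queries = np.array([[2.5, 2.5], [10.0, 10.0]])
    print(classify_ad(train, queries, threshold, k=3))
```

In practice, predictions for samples labeled Out-AD would be flagged as unreliable rather than discarded silently. The choice of feature space, distance metric, and cutoff would need to follow the actual drug and target descriptors in use.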